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DETERMINING  THE  STABILITY  OF  A  MEAN  ESTIMATE  FROM 
CORRELATED  SAMPLES  BY  USE  OF  LINEAR  PREDICTION 


INTRODUCTION 

When  a  collection  of  N  data  samples  of  a  stationary  process  are  taken,  a 
common  method  of  estimating  the  mean  of  the  random  process  is  to  use  the 
sample  mean  of  the  available  data.  If  the  data  samples  are  taken  with  a  time 
increment  large  enough  that  they  are  linearly  independent  of  each  other,  that 
is,  uncorrelated  samples,  then  the  variance  of  the  mean  estimate  is  equal  to 
the  actual  (unknown)  variance  of  the  process  divided  by  N.  In  this  case,  the 
quality  of  the  mean  estimate  can  be  approximated  by  also  estimating  the 
process  variance,  that  Is,  by  computing  the  sample  variance  of  the  available 
data. 


However,  when  the  time  increment  Is  small  enough  that  the  data  samples 
are  statistically  dependent  on  each  other,  the  variance  of  the  mean  estimate 
depends  on  the  (unknown)  covariance  of  the  given  process.  Thus,  in  order  to 
determine  the  stability  of  a  first-order  statistic  (mean),  we  need  information 
on  a  second-order  statistic  (covariance  or  spectrum).  Since  this  latter 
Information  is  unknown,  we  also  need  to  estimate  it  from  the  available  data. 
Three  procedures  for  accomplishing  this  goal  will  be  discussed  here. 
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FUNDAMENTAL  EQUATIONS 


DEFINITIONS 

Suppose  that  real  stationary  random  process  x(t)  has  mean 

m  =  x(t)  (1) 

and  covariance  function 


RX(D  =  [x(t)  -  m][x(t  -t)  -  m]  .  (2) 

(An  overbar  denotes  an  ensemble  average.)  The  spectrum  of  process  x(t)  is* 

Gx(f)  =  jdr  exp(-12irfT)  Rx(T)  .  (3) 

Neither  Rx  nor  Gx  contain  any  information  about  the  mean  value,  m,  of  the 
process.  We  presume  that  m,  Rx(T),  and  Gx( f )  are  all  unknown. 


SAMPLING  PROCEDURE 

Suppose  that  N  data  samples, 

x{na}  for  1  <  n  <  N  ,  (4) 


♦Integrations  and  summations  without  limits  are  over  the  complete  range  of 
nonzero  integrands  and  summands,  respectively. 


are  taken  of  process  x(t),  where  time  increment  a  might  be  small  enough  that 
the  samples  are  correlated;  that  is, 


Rx(na)  *  0  for  n  =  +1 ,  +2 . 


(5) 


We  estimate  the  actual  (unknown)  mean  m  according  to 

N 

m  =  ^  x(nA)  =  2  w(n)  x(nA)  ’ 

n=l  n 

» 

i 

where  (for  notational  convenience)  weights 


w(n) 


for 
otherwi 


1  <  n  < 
wise  J 


Now,  the  estimate  m  in  (6)  is  an  unbiased  estimate  of  m,  because 


(6) 


(7) 


N 

m  =  1  x(na)  =  m  ,  (8) 

n=l 

upon  use  of  (1).  But  how  good  an  estimate  is  m  of  m;  that  is,  what  is  the 
standard  deviation  of  random  variable  m?  This  information  is  needed  in  order 
to  establish  reasonable  confidence  limits  about  a  particular  value  of  an 
estimate,  m. 

To  illustrate  the  wide  range  of  possible  values  for  the  stability  of 
random  variable  m,  consider  the  two  limiting  cases  where  (a)  the  samples 
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{x(naj}  are  all  linearly  Independent  (uncorrelated)  of  each  other,  or 
alternatively,  (b)  completely  statistically  dependent  on  each  other.  As  shown 
below,  the  variance  of  m  Is 


*vi 


var(fn)  = 


a%/ N  for  (a) 


for  (b) 


where  a is  the  (unknown)  variance  of  process  x(t).  These  extreme  values 
differ  by  a  factor  of  N,  the  number  of  available  data  samples.  In  order  to 
determine  just  where  in  this  range  the  variance  of  estimate  m  lies  for  a 
particular  application,  more  information  must  be  extracted  from  the  available 
data  {x(naj}. 


VARIANCE  OF  ESTIMATE  m 


To  determining  the  variance  of  m,  consider  the  difference  random  variable 


m  -  m 


^  [x(nA)  -  m]  =  ^  w(n)  [x(nA)  -  m]  . 


Then  the  variance  of  mean  estimate  m  is 


v  =  var(m)  =  (m  -  m)2  = 


w(n)  w(p)  [x(nA)  -  m][x(pA)  -  m]  = 


=  w(n)  w(p)  Rx(nA  ”  PA)  =  *>(k)  Rx(kA)  , 
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where  we  used  covariance  (2)  and  defined 

e(k)  =  ^  w(n)  w(n-k)  (12) 

n 

as  the  autocorrelation  of  weight  sequence  {w(n)}  in  (6).  For  the  equl-weight 
sequence  of  (7),  this  function  is  depicted  in  figure  1. 


Now  NA  is  the  total  observation  time  covered  by  the  samples  fx(nA)}  ;  see 
(4) .  Suppose  that 


Na  »  effective  extent  of  RX(D  ;  (13) 

the  right  hand  side  of  (13)  is  also  called  the  correlation  (actually 
covariance)  time  of  process  x(t).  If  (13)  is  satisfied,  then  the  variance  in 
(11)  simplifies  to 


=  var(m)  =  0(0)  Rx(kA)  =  ^  ^Rx(kA)  . 


(14) 
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Thus,  when  the  observation  time  is  much  larger  than  the  correlation  time  of 

the  process,  the  variance  of  m  depends  only  on  the  sum  of  the  sampled 

covariance  values  of  the  process  x(t).  (Result  (9)  follows  easily  from  (11) 

2 

and  figure  1,  since  Rx(0)  =  ox.) 

However,  since  covariance  R^fr)  is  unknown,  we  cannot  determine  the 
right-hand  side  of  (14),  meaning  that  we  do  not  know  the  quality  of  estimate 
rfi.  In  the  next  section,  we  will  address  three  procedures  for  determining  the 
variance  v  of  estimate  m. 
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PROCEDURES  FOR  ESTIMATION 

In  this  section,  we  will  present  three  procedures  for  determining  the 
variance  of  mean  estimate  m  and  discuss  some  of  their  limitations. 

MULTIPLE  PIECES 

This  first  procedure,  which  is  probably  the  most  obvious  one,  consists  of 
dividing  the  available  N  data  points  in  (4)  into  K  abutting  pieces  of  data, 
each  of  size  D,  where 


N  =  K  D  .  (15) 

For  the  k-th  piece,  we  estimate  the  mean  of  x(t)  according  to 

D 

mk  =  x(dA  +  ( k-1 ) DA)  for  1  <  k  <  K  .  (16) 

d=l 

Then  the  average  of  these  K  sub-estimates  is  simply 
K  K  D  N 

kS^K  51  5  x(da  +  (k-1>Da)  =  n  2  x(nA)  =  ™  *  (17) 

k=l  k=l  d=l  n=l 

the  overall  estimate  already  considered  above  in  (6). 

Additionally,  we  now  have  the  option  of  looking  at  the  variability  of  the 
K  random  variables  {m^}  in  (16)  in  order  to  see  how  useful  they  can  be  in 
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'■r\- 


m 


estimating  variance  v  in  (11)— (14) .  Consider  the  quantity  (for  K  >  2) 

K 


A 

V  s 


K(K 


-by  5.  <ak ' ffi)2 


(18) 


k=l 


which  can  be  formed  from  the  K  mean  estimates  in  (16).  It  is  shown  in 
appendix  A  that  v  in  (18)  is  an  unbiased  estimate  of  variance  v  in  (14), 


under  a  condition  on  unknown  covariance  Rx.  Specifically,  we  find  that 


v  =  v  =  var(m)  , 


(19) 


if 


Da  »  effective  extent  of  RX(T) 


(20) 


Since  Da  is  the  individual  segment  length  of  each  data  piece  used  in  (16), 
constraint  (20)  is  a  rather  restrictive  condition.  It  states  that  each 
segment  cannot  be  too  short;  therefore,  the  number  of  pieces,  K,  in  (15)  and 
(18)  cannot  be  too  large.  However,  since  the  stability  of  estimate  v  depends 
inversely  on  K  (see  ( A— 1 3 ) ) ,  satisfaction  of  (20)  may  not  allow  for  a  very 
stable  estimate  v  via  (18). 


What  we  are  doing  here,  by  means  of  (15)— (18),  is  breaking  the  available 
data  record  into  a  number  of  pieces  which  yield  essentially  independent 
estimates  of  mean  m.  The  segment  length,  DA,  being  longer  than  the  effective 


extent  of  covariance  Rx (T)  guarantees  this  property.  If  this  condition, 


(20),  is  not  met,  the  linear  taper  in  (A-3)  affects  the  sum  there  in  an 


a2 


unknown  way  and  can  severely  bias  the  value  of  m1 .  Also,  (20)  is  a  much 
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more  restrictive  condition  than  (13),  since  Da  Is  an  individual  segment 
length,  while  Na  Is  the  total  observation  Interval.  It  would  be  very 
worthwhile  to  employ  a  technique  which  Is  not  subject  to  the  restrictive 
condition  (20);  however,  we  must  be  willing  to  accept  condition  (13). 

If  this  multiple  piece  technique  Is  employed  and  condition  (20)  Is 
satisfied,  then  the  square  root  of  (18),  VS  can  be  used  as  an  estimate  of  the 
standard  deviation  of  mean  estimate  m  In  (6).  This  is  similar  to  the  approach 
in  [1,  appendix  0,  ( D-5 ) ] ,  but  done  more  rigorously  here.  More  details  are 
furnished  in  appendix  A. 

PERIODOGRAM 

The  variance  v  of  mean  estimate  m  In  (6)  is  given  by  (14),  when  condition 
(13)  is  satisfied.  Now  the  spectrum  of  the  ac  component  of  the  sampled 
process  {x(naj}  is  defined  as 

Gx(f)  =  A  Rx(kA)  exp(-i2*fka)  ,  (21) 

k 

where  Rx  is  the  covariance  of  x;  see  (2).  This  quantity  is  related  to  the 
spectrum  of  the  ac  component  of  the  continuous  process  x(t),  namely  Gx(f)  in 
(3),  according  to 

vf)  -  ^  G,(f  -  2) ;  (22> 

n 

that  is,  Sx(f )  is  the  aliased  version  of  Gx(f).  Reference  to  (14)  and 
(21)  reveals  that  variance 
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*  *  k  6x<°>  •  <23) 

Thus,  If  we  can  estimate  the  origin  value  of  the  spectrum  Sx(f)  of  sampled 
process  {x(na)j  ,  we  will  have  an  estimate  of  v. 

Consider,  then,  the  discrete  Fourier  transform  of  the  given  data  sequence: 
N-l 

X(A)  =  x((n+l)A)  exp(-i2irni/N)  for  0  <  i  <  N  -  1  .  (24) 

n=0 

Its  average  power,  at  any  frequency  index.?,  is 
_  N-l 

|XU)|  2  =  [m2  +  Rx(nA  -  PA)  ]  exp(-12ir(n  -  pU/N)  = 

n,p=° 

N 

=  m2N24A0  +  N  (l  -  Rx(ka)  exp(-12*W/N)  ,  (25) 

k=-N 

by  use  of  (2).  Then  for  X*  0,  if  condition  (13)  on  observation  time  Na  is 
observed,  we  have  the  approximation 


|x(jn|  2  ■  N  ^  Rx(ka)  exp(-12irkJI/N)  =  l  Gx(^  , 
k 

upon  use  of  (21).  In  particular 


^  |x«»l 2  ■  k  6x(k)  '»r  A'0  ■ 


(26) 


(27) 
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This  relation  suggests  the  following  procedure  for  estimating  the 
variance  v,  as  given  by  (23):  compute  and  plot  the  periodogram  |x(£)|2  for 
0  <  j(  <  N  -  1,  as  obtained  from  (24);  observe  the  region  where  these  (noisy) 
values  are  approximately  equal  to  the  origin  values,  excluding  X-  0;  average 
these  magnitude-squared  values  over  this  region.  If  this  region  is  1  <  %  <  L, 
say,  then  the  estimate  of  variance  is,  guided  by  (23)  and  (27), 


'-hzl  l*0^2  • 


The  value  of  the  discrete  Fourier  transform  at  the  origin  is  of  no  use  for 
variance  estimation  since  X(0)  =  N  m,  which  involves  the  sample  mean.  Also 
since  the  data  [x(nA)}  have  been  presumed  real  here,  then  X(N  -j ?  )  =  X*(l>), 
and  the  spectral  region  (negative  frequencies)  near  JZ »  N  -  1  contains  no  new 
Information. 


This  procedure  also  employed  condition  (13),  in  the  process  of  deriving 
(26).  However,  the  most  important  limitation  of  this  ad  hoc  procedure  is  the 
arbitrary  decision  about  the  region  L  in  frequency  that  should  be  used  in 
average  (28).  The  wider  the  region,  the  greater  the  stability  in  v;  however, 
too  wide  a  selection  will  result  in  a  biased  estimate  of  variance  v. 
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LINEAR  PREDICTION 


This  procedure  will  be  heavily  based  upon  the  material  In  [3]  on  linear 
predictive  spectral  estimation,  which  the  reader  must  be  familiar  with,  we 
first  observe  from  (21)  that 


a  v°>  -1  yki>  • 


Also,  from  [3;  (14)], 


1  V°»  • 


where  Eq  Is  the  minimum  mean  square  prediction  error  [3;  (8)]  and  filter 
coefficient  aQ  =  -1  [3;  (7)].  Thus,  the  variance  v  in  (23)  can  be  estimated 
If  we  can  estimate  the  quantities  on  the  right-hand  side  of  (30). 


For  unknown  process  covariance,  we  refer  to  [3;  (143)  and  (108)]  to 
obtain  estimate 


l  V°>  ■  ^  - 


5% 


where  Fq  Is  the  minimum  average  magnitude-squared  error  as  given  by 

[3;  (137)  and  (141)],  p  is  the  predictive  filter  order  that  converts  the 

predictive  error  to  a  white  process,  and  jalp  are  the  p-th  order 

^  nj  i 


•mr.  ''''-vww 


predictive  filter  coefficients.  A  comparison  of  (23)  and  (31)  then  leads  to 
variance  estimate 


"5s 


!  n=l  | 

In  terms  of  the  linear  predictive  filter  coefficients  and  associated  error. 
The  selection  of  order  p  Is  somewhat  subjective,  but  can  be  accomplished  by 
use  of  various  Information  criteria  [4]. 

In  order  to  avoid  the  deleterious  effects  of  a  large  mean  component  m  in 
data  {x(nA)}  on  the  filter  coefficients  and  the  origin  spectral  estimate  in 
(31),  the  sample  mean,  m,  must  be  subtracted  from  the  given  data  before  the 
linear  predictive  approach  Is  applied. 


SUMMARY  AND  DISCUSSION 


Sectioning  of  the  available  data  Into  a  number  of  disjoint  pieces 
requires  that  the  length  of  each  Individual  piece  be  much  longer  than  the 
correlation  time  of  the  process.  For  a  given  total  data  record,  this  limits 
the  number  of  pieces  that  can  be  employed  to  get  a  stable  estimate  of  the 
variance  of  the  mean  estimate.  Accordingly  this  technique  Is  not  recommended 
unless  a  very  large  data  record  Is  available  and  stability  Is  not  an  issue. 

The  perlodogram  technique  requires  an  ad  hoc  (eyeball)  decision  as  to  the 
region  near  the  frequency  origin  where  the  average  of  bin  powers  should  be 
conducted.  Other  than  this.  It  only  sequlres  that  the  total  data  record 
length  NA  be  much  larger  than  the  correlation  time  of  the  process;  this 

condition  Is  a  reasonable  one  and  will  be  required  of  any  technique  attempting 

to  extract  a  stable  estimate  of  the  variance  of  m. 

The  linear  predictive  approach  Is  also  subject  to  this  condition  on  total 
observation  time,  since  pA  Is  the  duration  of  the  Impulse  response  of  the 

linear  predictive  filter,  and  this  must  be  short  relative  to  NA,  in  order  to 

get  stable  estimates  of  filter  coefficients  {an}.  However,  we  can  utilize 
both  a  forward  as  well  as  a  backward  average  [3]  for  additional  stability. 
Also,  there  Is  an  automatic  Indication  of  whether  pA  «  Na,  simply  by 
observing  when  the  predictive  error  saturates  as  the  filter  order  Increases; 
that  Is,  the  filter  order  need  not  be  taken  any  larger  than  necessary,  thereby 
preserving  the  quality  of  the  resulting  estimates. 
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If  the  process  x(t)  under  investigation  is  complex,  the  current  analyses 
still  apply,  with  some  generalizations.  Thus,  the  second  bracketed  term  in 

(2)  for  the  covariance  must  be  conjugated,  and  the  variance  in  (11)  must  be  a 

magnitude-squared  error.  The  linear  predictive  procedures  in  [3,  4]  were,  in 
fact,  carried  out  for  complex  processes  and  so  apply  directly. 

These  techniques  also  afford  an  efficient  method  of  assigning  an 

equivalent  number,  Ng,  of  Independent  samples  to  a  given  data  set,  (4).  The 

variance  of  m  is  given  In  (14),  while  the  variance  of  a  set  of  independent 

samples  is  given  by  (9a).  If  we  equate  (14)  to  the  variance  which  pertains  to 

equivalent  number  N  ,  we  have 

e 


or 


Rx(0)  . 


(33) 


N 


e 


=  N 


»,<°> 

2  Rx‘k‘>  ' 
k 


(34) 


Since  the  quantities  on  the  right  hand  side  are  unknown,  we  replace  them  by 
estimates  obtainable  from  the  data.  In  particular,  using  the  linear 
predictive  results  in  (31),  we  are  led  to  estimate 


A 


=  N 


R  (0) 

x__ 


(35) 
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APPENDIX  A.  DERIVATIONS  FOR  MULTIPLE  PIECES  TECHNIQUE 

In  this  appendix,  we  concentrate  on  the  technique  described  In 
(1 5)— (18) .  The  estimate  v  In  (18)  can  be  expanded,  resulting  in 

K 


v  = 


K(K 


A  /  .  A  A  A/. 

mf  -  2  m  t  +  in  )  * 


k=l 


1 


K(K  -  1) 


m.  -  K  m 


(A-l) 


k*l 


where  we  employed  (17).  The  mean  of  the  random  variable  in  (A-l)  is  then 


0  - 


1  /  £2  aA 

~  (^mi " m ;  • 


(A-?) 


where  is  taken  as  a  representative  mean  estimate  from  the  set  in 
(16).  But  from  (16),  the  quantity 


ft,2  - 

ml  ' 


l  2  X(dA)  ]  =  ^  2  *(^rx(3 1)  = 


d=l 


d,j-l 


\  (m2  +  Rx(do  -  jA))  =  m2  +  ^  2  Rx(da  "  JA> 

0  d,  j=l  D  d,j=l 


-W 


k=-0 


('  -  -o1)  R> 


(kA)  , 


(  A— 3) 


by  use  of  (2) . 
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Now  If  the  number  of  data  points,  D,  In  each  piece  (16),  is  taken  large 
enough  that 


DA  »  effective  extent  of  Rx(T)  . 


( A— 4 ) 


then  (A-3)  simplifies  to 

m2  a  m2  +  1  ^  Rx(kA)  .  (A-5) 

k 

The  corresponding  result  for  m2  is  immediately  obtained  by  replacing  D  by  N 
everywhere,  getting 

m2  =  m2  +  J  '  (A-6) 

k 

When  we  employ  these  last  two  relations  In  (A -2).  there  follows 

0  *  rh-  (o  -  s)  2  «,(k«  ■  5  -  *  •  (*-■») 

k  k 

where  we  also  used  (15),  N  »  KD,  and  (14).  Thus,  estimate  v  in  (18)  is  an 
unbiased  estimator  of  v  =  var(m),  as  claimed.  No  Gaussian  assumptions  have 
been  utilized  anywhere. 


To  minimize  the  variance  of  random  variable  v  itself,  as  given  by  (18), 
we  should  take  K  large.  However,  for  a  limited  data  set  of  size  N,  this  will 
decrease  D  according  to  D  =  N/K.  Thus,  K  can  only  be  increased  to  the  point 
where  (A-4)  would  begin  to  be  violated.  If  we  satisfy  (A-4),  then  we  can  use 
as  an  estimate  of  the  standard  deviation  of  m.  We  do  not  need  or  utilize 
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detailed  knowledge  of  covariance  RX(D,  except  for  the  condition  (A-4); 
however,  this  may  be  a  very  restrictive  condition. 

To  demonstrate  that  large  K  is  beneficial  for  estimate  v  in  (18),  we  will 
derive  its  variance  under  the  assumption  that  random  variables  {m^}  in  (16) 
are  independent  Gaussian  random  variables.  This  Is  a  reasonable  assumption  if 
D  is  large,  as  already  presumed.  (If  data  {x ( n A )}  were  Gaussian,  that 
would  guarantee  the  Gaussian  character  of  {mk},  since  they  are  obtained  via 
linear  operations  on  the  data;  see  (16).} 


Using  (17),  we  express  random  variable  (A-l)  as 


A 

V 


1 

K(K  -  1) 


1 


K(K  -  1) 


M1  Q  M 


(A— 8) 


where  matrices 
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We  already  know  that 


mk  =  m 


var^k)  -  D  2  Rx(kA)  =  D  * 


from  (16)  and  ( A— 5 ) . 


(A-10; 


We  now  employ  [2];  in  particular,  see  (15)-(17).  Then  [2;  (30)]  gives, 
with  identifications 


N  -»  K,  y  -»  1,  P  -»  1,  au  =  av  +  . 


(A-i  i : 


the  results 


A  _  1 
V  =  K(K  -  1) 


-  ’>  0  -  KD  -  S  2 
k 


Rx(ka) 


and 


var(v)  = 


1 

K2(K  -  l)2 


Rv(kA) 


( A-l  2 


(A-i3: 


The  first  result  checks  (A-7),  as  expected.  The  second  result  involves  only 
one  free  parameter,  namely  K,  and  is  minimized  by  choosing  K  as  large  as 
possible,  but  always  subject  to  constraint  (A-4). 
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Observe  that  quality  ratio 


var(v) 


K  -  1 
2 


( A— 1 4 ) 


In  terms  of  the  "number  of  independent  pieces,"  K,  employed  in  (16),  and  is 
Independent  of  covariance  R^Cr). 
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